10 research outputs found

    Domain-specific language models and lexicons for tagging

    Abstract: Accurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are necessary to achieve desired part-of-speech tagger accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a quite small domain-specific corpus to a large general-English one boosts performance to over 92% accuracy from 87% in our studies. We also suggest a number of characteristics to quantify the similarities between a training corpus and the test data. These results give guidance for creating an appropriate corpus for building a part-of-speech tagger model that gives satisfactory accuracy results on a new domain at a relatively small cost.

    Anni R. Coden, Ph.D.

    Anni R. Coden, Ph.D. is currently the project manager and technical lead of the IBM Systems G Anomaly Detection Solution. Previously she led a project at IBM’s T.J. Watson Research Center on Modeling and Simulation in a Smarter Cities environment, with a focus on Emergency Response Management. Anni also managed the Medical Text and Image Analysis group. The team had a long-term collaboration with the Mayo Clinic, worked with the Memorial Sloan-Kettering Cancer Center and was also involved with academic research. Anni Coden joined IBM in 1981. Previously, she was a Researcher at the Massachusetts Institute of Technology, where she received her Ph.D. and M.S. in Computer Science. She received her M.S. in electrical engineering and her B.S. in mathematics from the Vienna University of Technology (Austria). Anni Coden has published in many areas such as theoretical computer science, computer vision, and computational linguistics and is the holder of multiple patents.

    Morning Session 1- Keynote Speaker: Anni R. Coden


    Automatic Search from Streaming Data

    LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P

    Domain-specific language models and lexicons for tagging
